PubChem3D: Shape compatibility filtering using molecular shape quadrupoles
Abstract
BACKGROUND PubChem provides a 3-D neighboring relationship, which involves finding the maximal shape overlap between two static compound 3-D conformations, a computationally intensive step. It is highly desirable to avoid this overlap computation, especially when it can be determined with certainty that a conformer pair cannot meet the criteria to be a 3-D neighbor. To this end, PubChem employs a series of pre-filters, based on the concept of volume, that remove approximately 65% of all conformer neighbor pairs prior to shape overlap optimization. Given that molecular volume, a somewhat vague concept, is rather effective as a filter, one may wonder: can the existing PubChem 3-D neighboring relationship, which consists of billions of shape-similar conformer pairs from tens of millions of unique small molecules, be used to identify additional shape descriptor relationships? Or, put more specifically, can one place an upper bound on shape similarity using other "fuzzy" shape-like concepts such as length, width, and height?
RESULTS Using a basis set of 4.18 billion 3-D neighbor pairs identified from single-conformer-per-compound neighboring of 17.1 million molecules, shape descriptors were computed for all conformers. These steric shape descriptors included several forms of molecular volume and shape quadrupoles, which essentially embody the length, width, and height of a conformer. For each 3-D neighbor conformer pair, the volume and each quadrupole component (Qx, Qy, and Qz) were binned and their frequency of occurrence was examined. Per molecular volume type, this effectively produced three different maps, one per quadrupole component (Qx, Qy, and Qz), of allowed values for the similarity metric, shape Tanimoto (ST) ≥ 0.8. The efficiency of these relationships (in terms of true positives, true negatives, false positives, and false negatives) as a function of ST threshold was determined in a test run of 13.2 billion conformer pairs not previously considered by the 3-D neighbor set. At ST ≥ 0.8, applying the separate Qx, Qy, and Qz maps in series (Qxyz) filtered out 40.4% of true negatives with only 32 false negatives out of 24 million true positives. This efficiency increased linearly as a function of ST threshold in the range 0.8-0.99. The Qx filter was consistently the most efficient, followed by Qy and then by Qz. Use of a monopole volume showed the best overall performance, followed by the self-overlap volume and then by the analytic volume. Application of the monopole-based Qxyz filter in a "real world" test of 3-D neighboring of 4,218 chemicals of biomedical interest against 26.1 million molecules in PubChem reduced the total CPU cost of neighboring by 24-38% and, if used as the initial filter, removed from consideration 48.3% of all conformer pairs at almost negligible computational overhead.
CONCLUSION Basic shape descriptors, such as those embodied by size, length, width, and height, can be highly effective in identifying shape-incompatible compound conformer pairs. When performing a 3-D search using a shape similarity cut-off, computation can be avoided by identifying conformer pairs that cannot meet the result criteria. Applying this methodology as a filter for PubChem 3-D neighboring computation yielded an improvement of 31%, increasing the average conformer pair throughput from 154,000 to 202,000 per second per CPU core.
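The map-based filtering described in the abstract can be illustrated with a minimal Python sketch. It is not the PubChem implementation. The `Shape` record, the bin widths, and the choice of map key (the unordered pair of `(volume bin, quadrupole bin)` cells for the two conformers) are assumptions made for illustration, since the abstract does not specify how the maps are laid out. The sketch records which bin combinations occur among known 3-D neighbor pairs, then rejects any candidate pair whose Qx, Qy, or Qz combination was never observed.

```python
from dataclasses import dataclass

# Hypothetical descriptor record: one volume plus three quadrupole components.
@dataclass(frozen=True)
class Shape:
    volume: float
    qx: float
    qy: float
    qz: float

COMPONENTS = ("qx", "qy", "qz")

def bin_index(value, width):
    """Map a continuous descriptor value to an integer bin."""
    return int(value // width)

def pair_key(a, b, comp, vol_width, q_width):
    """Order-independent key for a conformer pair and one quadrupole component."""
    ka = (bin_index(a.volume, vol_width), bin_index(getattr(a, comp), q_width))
    kb = (bin_index(b.volume, vol_width), bin_index(getattr(b, comp), q_width))
    return (ka, kb) if ka <= kb else (kb, ka)

def build_maps(neighbor_pairs, vol_width=10.0, q_width=0.5):
    """Record, per component, every bin combination seen among known neighbors."""
    maps = {comp: set() for comp in COMPONENTS}
    for a, b in neighbor_pairs:
        for comp in COMPONENTS:
            maps[comp].add(pair_key(a, b, comp, vol_width, q_width))
    return maps

def may_be_neighbors(a, b, maps, vol_width=10.0, q_width=0.5):
    """Apply the Qx, Qy, Qz maps in series; any miss rejects the pair."""
    return all(
        pair_key(a, b, comp, vol_width, q_width) in maps[comp]
        for comp in COMPONENTS
    )

# Toy training data standing in for the known 3-D neighbor pairs.
known = [(Shape(300.0, 8.0, 3.0, 1.5), Shape(310.0, 8.2, 3.1, 1.4))]
maps = build_maps(known)

elongated = Shape(300.0, 8.0, 3.0, 1.5)
compact = Shape(300.0, 2.0, 2.0, 2.0)
print(may_be_neighbors(elongated, compact, maps))  # rejected before any overlap optimization
```

Only pairs that pass all three lookups would proceed to the expensive shape overlap optimization. Because the maps are built from observed neighbors, a filter of this kind can produce false negatives for bin combinations missing from the training set, which is consistent with the small false negative count the abstract reports.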
Similar resources
PubChem3D: conformer ensemble accuracy
BACKGROUND PubChem is a free and publicly available resource containing substance descriptions and their associated biological activity information. PubChem3D is an extension to PubChem containing computationally-derived three-dimensional (3-D) structures of small molecules. All the tools and services that are a part of PubChem3D rely upon the quality of the 3-D conformer models. ...
PubChem3D: Diversity of shape
BACKGROUND The shape diversity of 16.4 million biologically relevant molecules from the PubChem Compound database and their 1.46 billion diverse conformers was explored as a function of molecular volume. RESULTS The diversity of shape space was investigated by determining the shape similarity threshold to achieve a maximum on the count of reference shapes per unit of conformer volume. The rat...
Some Insights into the Simultaneous Forward and Backward Whirling of Rotors
One of the greatest advantages of using complex coordinates in the study of rotating machinery is the gain of physical insight into the forward and backward precessional modes. Together with the complex modal analysis formulation, other interesting tools (like the Shape and Directivity Index Plot) were developed to characterize the shape and directivity of the whirl modes. These tools are helpf...
Effect of Alumina Nanoparticles on the Enhancement of Shape Memory, Mechanical and Impact Properties of TPU/ABS blend
In this paper, the shape memory, mechanical and Izod impact properties of a new shape memory nanocomposite based on thermoplastic polyurethane (TPU), acrylonitrile butadiene styrene (ABS) and alumina nanoparticles were investigated. The morphological results showed that the presence of 1% alumina nanoparticles made a reduction in diameter of ABS domains and caused a uniform distribution of the ...
Peak Extraction and Partial tracking of Music signals using Kalman filtering
In this paper we propose a partial tracking method for music signals based on Kalman filtering. We first introduce a novel technique for detection of peaks in spectral representations of music signals. We also introduce different evolution models for our Kalman filter based on the shape of frequency and power partials in different classes of melodic instruments. Parameters of these models are e...